Online blind source separation

ABSTRACT

A method for blind source separation of sources detects the sources using sensors to obtain representative data. The data is then represented by two mixtures having estimates of amplitude and delay mixing parameters. The estimates are updated, including: calculating error measures, each indicating a closeness of the estimates for a given source to a given time-frequency point in the mixtures; and revising the estimates, based on the error measures. The mixtures are filtered to obtain estimates of the sources, including: selecting one error measure having a smallest value in relation to any other error measures, for each time-frequency point in the mixtures; and leaving unaltered any time-frequency points for which a given error measure has the smallest value, while setting to zero any other time-frequency points for which the given error measure does not have the smallest value, for each error measure. The estimates of the sources are output.

BACKGROUND

[0001] 1. Technical Field

[0002] The present invention generally relates to separating signalsources and, in particular, to online blind separation of multiplesources.

[0003] 2. Background Description

[0004] The separation of independent sources from an array of sensors isa classic but difficult problem in signal processing. Generally, thesignal sources as well as their mixture characteristics are unknown.Without knowledge of the signal sources, other than a general assumptionthat the sources are independent, the signal processing is commonlyknown in the art as the “blind separation of sources”. The separation is“blind” because nothing is assumed about the independent source signals,nor about the mixing process.

[0005] A typical example of the blind separation of source signals iswhere the source signals are sounds generated by two independentsources, such as two (or more) separate speakers. An equal number ofmicrophones (two in this example) are used to produce mixed signals,each composed as a weighted sum of the source signals. Each of thesource signals is delayed and attenuated in some unknown amount duringpassage from the speaker to a microphone, where it is mixed with thedelayed and attenuated components of the other source signals.Multi-path signals, generated by multiple reflections of the sourcesignals, are further mixed with direct source signals. This is generallyknown as the “cocktail party” problem, since a person generally wishesto listen to a single sound source while filtering out other interferingsources, including multi-path signals.

[0006] According to the prior art, a blind source separation techniquethat allows the separation of an arbitrary number of sources from justtwo mixtures provided the time-frequency representations of sources donot overlap is described by Jouijine et al., in “Blind Separation ofDisjoint Orthogonal Signals: Demixing N Sources from 2 Mixtures”, inProceedings of the 2000 IEEE International Conference on Acoustics,Speech, and Signal Processing, Istanbul, Turkey, June 2000, vol. 5, pp.2985-88, June 2000. This technique is hereinafter referred to as the“original DUET algorithm”. The key observation in the technique is that,for mixtures of such sources, each time-frequency point depends on atmost one source and its associated mixing parameters. In anechoicenvironments, it is possible to extract the estimates of the mixingparameters from the ratio of the time-frequency representations of themixtures. These estimates cluster around the true mixing parameters and,identifying the clusters, one can partition the time-frequencyrepresentation of the mixtures to provide the time-frequencyrepresentations of the original sources.

[0007] The original DUET algorithm involved creating a two-dimensional(weighted) histogram of the relative amplitude and delay estimates,finding the peaks in the histogram, and then associating eachtime-frequency point in the mixture with one peak. The originalimplementation of the method was offline and passed through the datatwice; one time to create the histogram and a second time to demix.

[0008] Accordingly, it would be desirable and highly advantageous tohave an online method for performing blind source separation of multiplesources. Moreover, it would be further desirable and highly advantageousto have such a method that does not require the creation and updating ofa histogram or the locating of peaks in the histogram.

SUMMARY OF THE INVENTION

[0009] The problems stated above, as well as other related problems ofthe prior art, are solved by the present invention, a method for onlineblind separation of multiple sources.

[0010] The present invention provides an online version of the DUETalgorithm that avoids the need for the creation of the histogram, whichin turn avoids the computational load of updating the histogram and thetricky issue of finding and tracking peaks. The advantages of thepresent invention over the prior art, in particular, the original DUETalgorithm include: online (5 times faster than real time); 15 dB averageseparation for anechoic mixtures; 5 dB average separation for echoicmixtures; and can demix two or more sources from 2 mixtures.

[0011] According to an aspect of the present invention, there isprovided a method for blind source separation of multiple sources. Themultiple sources are detected using an array of sensors to obtain datarepresentative of the multiple sources. The data is represented by twomixtures having estimates of amplitude and delay mixing parameters. Theestimates of amplitude and delay mixing parameters are updated,comprising the steps of: calculating a plurality of error measures, eachof the plurality of error measures indicating a closeness of theestimates of amplitude and delay mixing parameters for a given source toa given time-frequency point in the two mixtures; and revising theestimates of amplitude and delay mixing parameters, based on theplurality of error measures. The two mixtures are filtered to obtainestimates of the multiple sources, comprising the steps of: selectingone of the plurality of error measures having a smallest value inrelation to any other of the plurality of error measures, for each of aplurality of time-frequency points in the mixtures; and leavingunaltered any of the plurality of time-frequency points in the mixturesfor which a given one of the plurality of error measures has thesmallest value, while setting to zero any other of the plurality oftime-frequency points in the mixtures for which the given one of theplurality of error measures does not have the smallest value, for eachof the plurality of error measures. The estimates of the multiplesources are output.

[0012] These and other aspects, features and advantages of the presentinvention will become apparent from the following detailed descriptionof preferred embodiments, which is to be read in connection with theaccompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013]FIG. 1 is a block diagram of a computer processing system 100 towhich the present invention may be applied according to an illustrativeembodiment thereof;

[0014]FIG. 2 is a flow diagram illustrating a method for blind sourceseparation of multiple sources, according to an illustrative embodimentof the present invention;

[0015]FIG. 3 is a flow diagram illustrating step 210 of the method ofFIG. 2, according to an illustrative embodiment of the presentinvention;

[0016]FIG. 4 is a flow diagram illustrating step 240 of the method ofFIG. 2, according to an illustrative embodiment of the presentinvention;

[0017]FIG. 5 is a flow diagram illustrating step 250 of the method ofFIG. 2, according to an illustrative embodiment of the presentinvention;

[0018]FIG. 6 is a flow diagram illustrating step 270 of the method ofFIG. 2, according to an illustrative embodiment of the presentinvention;

[0019]FIG. 7 is a diagram illustrating a test setup for blind sourceseparation on anechoic data, according to an illustrative embodiment ofthe present invention;

[0020]FIG. 8 is a diagram illustrating a comparison of overallseparation SNR gain by angle difference for the anechoic data, accordingto an illustrative embodiment of the present invention;

[0021]FIG. 9 is a diagram illustrating the overall separation SNR gainby 30 degree angle pairing for the anechoic data, according to anillustrative embodiment of the present invention;

[0022]FIG. 10 is a diagram illustrating a comparison of overallseparation SNR gain by angle difference, using echoic office data in avoice versus noise comparison, according to an illustrative embodimentof the present invention;

[0023]FIG. 11 is a diagram illustrating separation results for pairwisemixtures of voices, according to an illustrative embodiment of thepresent invention; and

[0024]FIG. 12 is a diagram illustrating W-disjoint orthogonality forvarious sources, according to an illustrative embodiment of the presentinvention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0025] The present invention is directed to online blind separation ofmultiple sources. It is to be understood that the present invention maybe implemented in various forms of hardware, software, firmware, specialpurpose processors, or a combination thereof. Preferably, the presentinvention is implemented as a combination of hardware and software.Moreover, the software is preferably implemented as an applicationprogram tangibly embodied on a program storage device. The applicationprogram may be uploaded to, and executed by, a machine comprising anysuitable architecture. Preferably, the machine is implemented on acomputer platform having hardware such as one or more central processingunits (CPU), a random access memory (RAM), and input/output (I/O)interface(s). The computer platform also includes an operating systemand microinstruction code. The various processes and functions describedherein may either be part of the microinstruction code or part of theapplication program (or a a combination thereof) that is executed viathe operating system. In addition, various other peripheral devices maybe connected to the computer platform such as an additional data storagedevice and a printing device.

[0026] It is to be further understood that, because some of theconstituent system components and method steps depicted in theaccompanying Figures are preferably implemented in software, the actualconnections between the system components (or the process steps) maydiffer depending upon the manner in which the present invention isprogrammed. Given the teachings herein, one of ordinary skill in therelated art will be able to contemplate these and similarimplementations or configurations of the present invention.

[0027]FIG. 1 is a block diagram of a computer processing system 100 towhich the present invention may be applied according to an illustrativeembodiment thereof. The computer processing system 100 includes at leastone processor (CPU) 102 operatively coupled to other components via asystem bus 104. A read only memory (ROM) 106, a random access memory(RAM) 108, a display adapter 110, an I/O adapter 112, and a userinterface adapter 114 are operatively coupled to the system bus 104.

[0028] A display device 116 is operatively coupled to the system bus 104by the display adapter 110. A disk storage device (e.g., a magnetic oroptical disk storage device) 118 is operatively coupled to the systembus 104 by the I/O adapter 112.

[0029] A mouse 120 and keyboard 122 are operatively coupled to thesystem bus 104 by the user interface adapter 114. The mouse 120 andkeyboard 122 may be used to input/output information to/from thecomputer processing system 100.

[0030] The present invention will now be described generally withrespect to FIGS. 2-6. Subsequent thereto, more detailed descriptions ofvarious aspects of the present invention are provided. It is to be notedthat while any equations provided with respect to FIGS. 2-6 may be soprovided out of order; however, in the subsequent detailed description,the equations and corresponding text are provided sequentially in orderof performance according to a preferred embodiment of the presentinvention. Given the teachings of the present invention provided herein,one of ordinary skill in the related art will readily contemplatevariations of the sequence and actual details of the equations andcorresponding steps described herein while maintaining the spirit andscope of the present invention.

[0031]FIG. 2 is a flow diagram illustrating a method for blind sourceseparation of multiple sources, according to an illustrative embodimentof the present invention. The multiple sources are detected using anarray of sensors to obtain two mixtures corresponding to the multiplesources (step 210). A frequency domain representation is computed forthe two mixtures (step 220).

[0032] Subsequent to step 220, the method proceeds to step 240 to updatemixing parameters of the two mixtures, and then the method proceeds tostep 250 and filters the two mixtures.

[0033] Subsequent to step 250, a time domain representation is computedfor the two mixtures (step 260). Estimates of the (original) multiplesources are output (step 270).

[0034] It is to be appreciated that the steps directed to computing thetime domain and frequency domain representations (steps 220 and 260,respectively) are readily performed by one of ordinary skill in therelated art. Nonetheless, for further detail on such computations, thereader is referred to Deller et al., “Discrete-Time Processing of SpeechSignals”, IEEE Press, pubs., 2000. The other steps are described infurther detail below with respect to FIGS. 3-6.

[0035]FIG. 3 is a flow diagram illustrating step 210 (obtain mixtures)of the method of FIG. 2, according to an illustrative embodiment of thepresent invention. The multiple sources to be separated are mixed toobtain two mixtures x1, x2 (step 310), expressed as follows:$\begin{matrix}{{{x_{1}(t)} = {\sum\limits_{j = 1}^{N}{s_{j}(t)}}},} & (1) \\{{{x_{2}(t)} = {\sum\limits_{j = 1}^{N}{a_{j}{s_{j}\left( {t - \delta_{j}} \right)}}}},} & (2)\end{matrix}$

[0036] where N is a number of the multiple sources; δ_(j) is an arrivaldelay between the array of sensors resulting from an angle of arrival;α_(j) is a relative attenuation factor corresponding to a ratio ofattenuations of paths between the multiple sources and the array ofsensors; s_(j) (t) is a j^(th) source; j is a source index ranging from1 to N, where N is a number of the multiple sources; and t is a timeargument. We use Δ to denote the maximal possible delay between sensors,and thus, |δ_(j)|≦Δ, ∀j.

[0037] It is to be appreciated that an “array of sensors” is referred toherein with respect to detecting multiple sources. The array of sensorsmay include 2 (a pair) or more sensors.

[0038]FIG. 4 is a flow diagram illustrating step 240 (update mixingparameters) of the method of FIG. 2, according to an illustrativeembodiment of the present invention. In general, the method of FIG. 4updates the amplitude and delay estimates so that they better explainthe measured data. The input information to the method is thetime-frequency representations of the mixtures.

[0039] Upon receiving the k^(th) block of data corresponding to thefrequency domain representation of a window of data centered atτ_(k)=kτ_(Δ) where τ_(Δ) is the time separating adjacent windows of thefirst mixture x₁ and the second mixture x₂, p(α_(j),δ_(j),ω,τ_(k)),which represents a distance measure for how well the current guesses asto what are the mixing parameters of the multiple sources are in lightof the current data, is computed for each j=1, . . . ,N as follows (step410): $\begin{matrix}{{p\left( {a_{j},\delta_{j},\omega,\tau_{k}} \right)} = {\frac{1}{1 + a_{j}^{2}}{{{{X_{1}\left( {\omega,\tau_{k}} \right)}a_{j}^{- {\omega\delta}_{j}}} - {X_{2}\left( {\omega,\tau_{k}} \right)}}}^{2}}} & (10)\end{matrix}$

[0040] where α_(j) and δ_(j) are current estimates of amplitude anddelay mixing parameters, respectively; X₁(ω,τ_(k)) and X₂(ω,τ_(k)) aretime-frequency representations of a first mixture and a second mixtureof the two mixtures, respectively; k is a current time index; τ_(k) is atime argument corresponding to a k^(th) time index; ω is a frequencyargument; and j is a source index ranging from 1 to N, where N is anumber of the multiple sources.

[0041] It is to be appreciated that p(α_(j),δ_(j),ω,τ_(k)) is an errormeasure that calculates how much a given time-frequency point in themixtures is explained by a particular guess of the j^(th) source'smixing parameters. The closer that p(α_(j),δ_(j),ω,τ_(k)) is to zero,the better it explains the time-frequency content of the mixtures, andthe more likely the particular guess is the correct guess.

[0042] It is to be noted that each p(α_(j),δ_(j),ω,τ_(k)) term (whereinj=1, . . . ,N) depends on α_(j) and δ_(j), the current estimates of therelative amplitude and delay parameters, respectively. Moreover, eachp(α_(j),δ_(j),ω,τ_(k)) term measures a distance assessing how well thedata matches the j^(th) mixing parameter estimate. The smaller thedistance, the better the j^(th) pair of mixing parameters explains thecorresponding time-frequency point in the mixtures. Thep(α_(j),δ_(j),ω,τ_(k)) play an important role in the updating of theamplitude and delay estimates, both of which are computed in two steps.First, we estimate by how much and in which direction we should changeeach estimate, this is the calculation in Equations (19) and (18) foramplitude and delay, respectively. The better the data is explained by agiven amplitude-delay estimate, the larger the effect of that data onthe resulting change calculation. Second, the new amplitude and delayestimates are calculated via Equations (20) and (21), respectively. Thedirection of the change has been calculated in Equations (19) and (18)and the magnitude of the change is scaled by a learning rate constantbeta and a variable learning rate calculated using Equations (22) and(23). The variable learning rate is calculated such that estimates thatare explaining more data that they previously did have a higher learningrate and estimates that are explaining less data that they previouslydid have a lower learning rate. Estimates that explain roughly the sameamount of data over time (that is, not an increasing or decreasingamount) have roughly a constant learning rate.

[0043] Upon computing p(α_(j),δ_(j),ω,τ_(k)), the following are computed$\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}$

[0044] (step 420), q_(j)[k] (step 430), and$\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}$

[0045] (step 440).$\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}},$

[0046] which represents the direction and magnitude of change in thecurrent estimate of the j^(th) source's amplitude mixing parametercausing the greatest change in how well the amplitude estimate describes(corresponds to) the data, is computed as follows (step 420):$\begin{matrix}{{\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}} = {\sum\limits_{\omega}\quad {\frac{^{{- \lambda}\quad {p{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\sum\limits_{l}^{{- \lambda}\quad {p{({a_{l},\delta_{l},\omega,\tau_{k}})}}}}\frac{2}{\left( {1 + a_{j}^{2}} \right)^{2}}}}}\left( \left( {{\left( {a_{j}^{2} - 1} \right){Re}\left\{ {{X_{1}\left( {\omega,\tau_{k}} \right)}\overset{\_}{X_{2}\left( {\omega,\tau_{k}} \right)}^{{- }\quad {\omega\delta}_{j}}} \right\}} + {a_{j}\left( {{{X_{1}\left( {\omega,\tau_{k}} \right)}}^{2} + {{X_{2}\left( {\omega,\tau_{k}} \right)}}^{2}} \right)}} \right) \right.} & (19)\end{matrix}$

[0047] where k is a current time index; τ_(k) is a time argumentcorresponding to a k^(th) time index; α_(j) is the current estimate ofamplitude mixing parameter for the j^(th) source; δ_(j) is a currentestimate of amplitude mixing parameter for the j^(th) source;X₁(ω,τ_(k)) and X₂(ω,τ_(k)) are time-frequency representations of afirst mixture and a second mixture of the two mixtures, respectively; ωis a frequency argument; j and l are source indexes ranging from 1 to N,where N is a number of the multiple sources; p(α_(j),δ_(j),ω,τ_(k)) isan error measure for the j^(th) source; λ is a smoothness parameter; andRe is a function that returns a real part of a complex number.

[0048] q_(j)[k], which represents the amount of mixture energy which isexplained (defined) by the j^(th) source's amplitude and delay mixingparameters, is computed as follows (step 430): $\begin{matrix}{{q_{j}\lbrack k\rbrack} = {\sum\limits_{\omega}\quad {\frac{^{{- \lambda}\quad {p{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\sum\limits_{l}^{{- \lambda}\quad {p{({a_{l},\delta_{l},\omega,\tau_{k}})}}}}{{X_{1}\left( {\omega,\tau_{k}} \right)}}\quad {{X_{2}\left( {\omega,\tau_{k}} \right)}}}}} & (22)\end{matrix}$

[0049] where k is a current time index; τ_(k) is a time argumentcorresponding to a k^(th) time index; α_(j) and δ_(j) are currentestimates of amplitude and delay mixing parameters, respectively for thej^(th) source; X₁(ω,τ_(k)) and X₂(ω,τ_(k)) are time-frequencyrepresentations of a first mixture and a second mixture of the twomixtures, respectively; ω is a frequency argument; j and l are sourceindexes ranging from 1 to N, where N is a number of the multiplesources; and λ is a smoothness parameter.$\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}},$

[0050] which represents the direction and magnitude of change in thecurrent estimate of the j^(th) source's delay parameter causing thegreatest change in how well the delay estimate describes (correspondsto) the data, is computed as follows (step 440): $\begin{matrix}{\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}} = {\sum\limits_{\omega}{\frac{^{{- \lambda}\quad {p{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\sum\limits_{l}^{{- \lambda}\quad {p{({a_{l},\delta_{l},\omega,\tau_{k}})}}}}\quad \frac{{- 2}\omega \quad a_{j}}{1 + a_{j}^{2}}{Im}\left\{ {{X_{1}\left( {\omega,\tau_{k}} \right)}\overset{\_}{X_{2}\left( {\omega,\tau_{k}} \right)}^{{- }\quad {\omega\delta}_{j}}} \right\}}}} & (18)\end{matrix}$

[0051] where k is a current time index; τ_(k) is a time argumentcorresponding to a k^(th) time index; α_(j) is a current estimate ofamplitude mixing parameter for the j^(th) source; δ_(j) is the currentestimate of delay mixing parameter for the j^(th) source; X₁((ω,τ_(k))and X₂(ω,τ_(k)) are time-frequency representations of a first mixtureand a second mixture of the two mixtures, respectively; ω is a frequencyargument; j and l are source indexes ranging from 1 to N, where N is anumber of the multiple sources; p(α_(j),δ_(j),ω,τ_(k)) is an errormeasure for the j^(th) source; λ is a smoothness parameter; and Im is afunction that returns an imaginary part of a complex number.

[0052] Subsequent to step 430, α_(j)[k], which represents the timedependent parameter (variable) learning rate, is computed as follows(step 450): $\begin{matrix}{{\alpha_{j}\lbrack k\rbrack} = \frac{q_{j}\lbrack k\rbrack}{\sum\limits_{m = 0}^{k}{\gamma^{k - m}{q_{j}\lbrack m\rbrack}}}} & (23)\end{matrix}$

[0053] where q_(j)[k] represents an amount of mixture energy that isdefined by estimates of amplitude and delay mixing parameters for aj^(th) source; γ is a forgetting factor; m is a time index ranging from0 to a current time index k; and j is a source index ranging from 1 toN, where N is a number of the multiple sources.

[0054] Upon computing$\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}$

[0055] and α_(j)[k], mixing parameter estimate α_(j)[k] is updated asfollows (step 460): $\begin{matrix}{{a_{j}\lbrack k\rbrack} = {{a_{j}\left\lbrack {k - 1} \right\rbrack} - {\beta \quad {a_{j}\lbrack k\rbrack}\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}}}} & (20)\end{matrix}$

[0056] Upon computing$\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}$

[0057] and α_(j)[k], mixing parameter estimate δ_(j)[k] is updated asfollows (step 470): $\begin{matrix}{{\delta_{j}\lbrack k\rbrack} = {{\delta_{j}\left\lbrack {k - 1} \right\rbrack} - {\beta \quad {\delta_{j}\lbrack k\rbrack}\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}}}} & (21)\end{matrix}$

[0058] where, for steps 460 and 470, a_(j)[k] and δ_(j)[k] are theestimates of amplitude and delay mixing parameters for a j^(th) sourceat a time index k, respectively; β is a learning rate constant;$\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}$

[0059] represents the magnitude and the direction of change in a currentestimate of amplitude mixing parameter for a j^(th) source that causes alargest change in correspondence between the current estimate and thedata; $\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}$

[0060] represents the magnitude and the direction of change in a currentestimate of delay mixing parameter for a j^(th) source that causes alargest change in correspondence between the current estimate and thedata; τ_(k) is a current time; k is a current time index; and j is asource index ranging from 1 to N, wherein N is a number of the multiplesources.

[0061]FIG. 5 is a flow diagram illustrating step 250 (filter) of themethod of FIG. 2, according to an illustrative embodiment of the presentinvention.

[0062] Using p(α_(j),δ_(j),ω,τ_(k)) time-frequency masks are computedfor the estimation of the j^(th) source as follows (step 510):$\begin{matrix}{{\Omega_{j}\left( {\omega,\tau_{k}} \right)} = \left\{ \begin{matrix}1 & {{p\left( {a_{j},\delta_{j},\omega,\tau_{k}} \right)} \leq {{p\left( {a_{m},\delta_{m},\omega,\tau_{k}} \right)}\quad {\forall{m \neq j}}}} \\0 & {otherwise}\end{matrix} \right.} & (24)\end{matrix}$

[0063] where Ω_(j)(ω,τ_(k)) is a time-frequency mask;p(α_(j),δ_(j),ω,τ_(k)) is an error measure for a j^(th) source; j and mare source indexes ranging from 1 to N, where N is a number of themultiple sources; τ_(k) is a current time; k is a current time index;α_(j) and δ_(j) are current estimates of amplitude and delay mixingparameters for the j^(th) source, respectively; and ω is a frequencyargument.

[0064] It is to be appreciated that, as noted above, the closerp(α_(j),δ_(j),ω,τ_(k)) is to zero, the better it explains thetime-frequency content of the mixtures, and the more likely theparticular guess is the correct guess. Thus, for each time-frequencypoint, one of the guesses ha to be the correct one, so we choose thesmallest value of p(α_(j),δ_(j),ω,τ_(k)) as it best explains the data;this is the selection done in Equation (24).

[0065] Using the time-frequency masks, the first mixture x₁ and thesecond mixture x₂ are filtered to obtain estimates of the (original)multiple sources as follows (step 520):

S _(j)(ω,τ_(k))=Ω_(j)(ω,τ_(k))X ₁(ω,τ_(k))  (25)

[0066] where S_(j)(ω,τ_(k)) is an estimate of a time-frequencyrepresentation of a j^(th) source; j is a source index ranging from 1 toN, where N is a number of the multiple sources; Ω_(j)(ω,τ_(k)) is atime-frequency mask; X₁(ω,τ_(k)) is a time-frequency representation of afirst mixture of the two mixtures; ω is a frequency argument; τ_(k) is acurrent time; k is a current time index.

[0067] Thus, it is to be appreciated that at step 520, we take all thetime-frequency points for which p(α_(j),δ_(j),ω,τ_(k)) was the bestmatch (that is, where p(α_(j),δ_(j),ω,τ_(k)) has the smallest value, andleave these time-frequency points unaltered while we set to zero (orsome low threshold) all the time-frequency pints in the mixtures forwhich p(α_(j),δ_(j),ω,τ_(k)) was not the smallest. This time-frequencyfiltered version of the mixtures is the estimate of the time-frequencyrepresentation of the first source. We repeat this for each j=1, . . .,N to obtain the N original source estimates. This is the filtering ofEquation (25).

[0068]FIG. 6 is a flow diagram illustrating step 270 (output estimates)of the method of FIG. 2, according to an illustrative embodiment of thepresent invention.

[0069] A dual window function is applied to the estimates of the(original) multiple sources obtained at step 520 of FIG. 5 toreconstruct the multiple sources from the estimates (step 610).

[0070] A description of mixing parameter estimation will now be given,according to an illustrative embodiment of the present invention. Suchdescription will include descriptions of source mixing, sourceassumptions, amplitude-delay estimation, and ML mixing parametergradient search. It is to be appreciated that, for the sake of brevity,definitions of terms appearing in the equations herein below will not berepeated; such definitions have been provided with respect to FIGS. 2-6herein above or are readily ascertainable by one of ordinary skill inthe related art.

[0071] Accordingly, source mixing, which is associated herein withmixing parameter estimation, will first be described, according to anillustrative embodiment of the present invention. Consider measurementsof a pair of sensors where only the direct path is present. In thiscase, without loss of generality, we can absorb the attenuation anddelay parameters of the first mixture, x₁(t), into the definition of thesources. The two mixtures can thus be expressed as, $\begin{matrix}{{{x_{1}(t)} = {\sum\limits_{j = 1}^{n}\quad {s_{j}(t)}}},} & (1) \\{{{x_{2}(t)} = {\sum\limits_{j = 1}^{n}\quad {a_{j}{s_{j}\left( {t - \delta_{j}} \right)}}}},} & (2)\end{matrix}$

[0072] where N is the number of sources, δ_(j) is the arrival delaybetween the sensors resulting from the angle of arrival, and α_(j) is arelative attenuation factor corresponding to the ratio of theattenuations of the paths between sources and sensors. We use Δ todenote the maximal possible delay between sensors, and thus, |δ_(j)|≦Δ,∀j.

[0073] A description of source assumptions, which are associated hereinwith mixing parameter estimation, will now be given according to anillustrative embodiment of the present invention.

[0074] We call two functions s₁(t) and s₂(t) W-disjoint orthogonal if,for a given windowing function W(t), the supports of the windowedFourier transforms of s₁(t) and s₂(t) are disjoint. The windowed Fouriertransform of s_(j)(t) as defined-, $\begin{matrix}{{{{F^{W}\left( {s_{j}( \cdot )} \right)}\left( {\omega,\tau} \right)} = {\int_{- \infty}^{\infty}\quad {{W\left( {t - \tau} \right)}{s_{j}(t)}e^{- {i\omega t}}{t}}}},} & (3)\end{matrix}$

[0075] which we will refer to as S_(j)(ω, τ) where appropriate. TheW-disjoint orthogonality assumption can be stated concisely,

S ₁(ω,τ)S ₂(ω,τ)=0,∀ω,τ.  (4)

[0076] In Appendix A, we introduce the notion of approximate W-disjointorthogonality. When W(t)=1, we use the following property of the Fouriertransform,

F ^(w)(s _(j)(·−δ))(ω,τ)=e ^(−iωδ) F ^(W)(s _(j)(·))(ω,τ)  (5)

[0077] We will assume that (5) holds for all δ,|δ|≦Δ, even when W(t) hasfinite support. This assumption is further described by Balan et al., in“The Influence of Windowing on Time Delay Estimates”, Proceedings of the2000 CISS, Princeton, N.J., Mar. 15-17, 2000.

[0078] A description of amplitude-delay estimation, which is associatedherein with mixing parameter estimation, will now be given according toan illustrative embodiment of the present invention. Using the aboveassumptions, we can write the model from (1) and (2) for the case withtwo array elements as, $\begin{matrix}{{\begin{bmatrix}{X_{1}\left( {\omega,\tau} \right)} \\{X_{2}\left( {\omega,\tau} \right)}\end{bmatrix} = {\begin{bmatrix}1 \\{a_{j}e^{- {i\omega\delta}_{j}}}\end{bmatrix}{S_{j}\left( {\omega,\tau} \right)}}},\quad {{for}\quad {some}\quad {j.}}} & (6)\end{matrix}$

[0079] For W-disjoint orthogonal sources, we note that at most one ofthe N sources will be non-zero for a given (T, I), thus, $\begin{matrix}{{\begin{bmatrix}{X_{1}\left( {\omega,\tau} \right)} \\{X_{2}\left( {\omega,\tau} \right)}\end{bmatrix} = {\begin{bmatrix}1 \\{a_{j}e^{- {i\omega\delta}_{j}}}\end{bmatrix}\quad {S_{j}\left( {\omega,\tau} \right)}}},\quad {{for}\quad {some}\quad {j.}}} & (7)\end{matrix}$

[0080] The original DUET algorithm estimated the mixing parameters byanalyzing the ratio of X₁(ω,τ) and X₂(ω,τ). In light of (7), it is clearthat mixing parameter estimates can be obtained via, $\begin{matrix}{\left( {{a\left( {\omega,\tau} \right)},{\delta \left( {\omega,\tau} \right)}} \right) = \left( {\left| \frac{X_{2}\left( {\omega,\tau} \right)}{X_{1}\left( {\omega,\tau} \right)} \right|,{\frac{1}{\omega}{Im}\left\{ \left( {\ln \frac{X_{1}\left( {\omega,\tau} \right)}{X_{2}\left( {\omega,\tau} \right)}} \right) \right\}}} \right)} & (8)\end{matrix}$

[0081] The original DUET algorithm constructed in 2-D histogram ofamplitude-delay estimates and looked at the number and location of thepeaks in the histogram to determine the number of sources and theirmixing parameters. The 2-D histogram is further described by Jouijine etal., in “Blind Separation of Disjoint Orthogonal Signals: Demixing NSources from 2 Mixtures”, in Proceedings of the 2000 IEEE InternationalConference on Acoustics, Speech, and Signal Processing, Istanbul,Turkey, June 2000, vol. 5, pp. 2985-88.

[0082] A description of a maximum likelihood (ML) mixing parametergradient search, which is associated herein with mixing parameterestimation, will now be given according to an illustrative embodiment ofthe present invention. For the online algorithm, we take a differentapproach. First note that,

|X ₁(ω,τ)α_(j) e ^(−iωδ) ^(_(j)) −X ₂(ω,τ)|²=0,  (9)

[0083] if source j is the active source at time-frequency (ω,τ)Moreover, defining, $\begin{matrix}{{p\left( {a_{j},\delta_{j},\omega,\tau} \right)} = \left. \frac{1}{1 + a_{j}^{2}} \middle| {{{X_{1}\left( {\omega,\tau} \right)}a_{j}e^{- {i\omega\delta}_{j}}} - {X_{2}\left( {\omega,\tau} \right)}} \right|^{2}} & (10)\end{matrix}$

[0084] we can see that, $\begin{matrix}{{{\sum\limits_{\omega}\quad {\min \left( {{p\left( {a_{1},\delta_{1},\omega,\tau} \right)},\ldots,{p\left( {a_{N},\delta_{N},\omega,\tau} \right)}} \right)}} = 0},} & (11)\end{matrix}$

[0085] because at least one p(α_(j),δ_(j),ω,τ) will be zero at eachfrequency. In the Appendix, it is shown that the maximum likelihoodestimates of the mixing parameters satisfy, $\begin{matrix}{\min\limits_{a_{1},\delta_{1},\ldots,a_{N},\delta_{N}}{\sum\limits_{\quad \omega}^{\quad}\quad {\min \quad {\left( {{p\left( {{a_{1}\delta_{1}},\omega,\tau} \right)},\ldots,{p\left( {a_{N},\delta_{N},\omega,\tau} \right)}} \right).}}}} & (12)\end{matrix}$

[0086] We perform gradient descent with (12) as the objective functionto learn the mixing parameters. In order to avoid the discontinuousnature of the minimum function, we approximate it smoothly as follows,$\begin{matrix}{{\min \left( {p_{1},p_{2}} \right)} = \frac{\left. {p_{1} + p_{2} -} \middle| {p_{1} - p_{2}} \right|}{2}} & (13) \\{\approx \frac{p_{1} + p_{2} - {\varphi \left( {p_{1} - p_{2}} \right)}}{2}} & (14) \\{= {\frac{- 1}{\lambda}1{n\left( {e^{- {\lambda p}_{1}} + e^{- {\lambda p}_{2}}} \right)}}} & (15) \\{{where},\quad {{\varphi (x)} = {{\int_{0}^{x}{\frac{1 - e^{\lambda t}}{1 + e^{- {\lambda t}}}\quad {t}}} = {x + {\frac{2}{\lambda}1{n\left( {1 + e^{- \lambda}} \right)}}}}}} & (16)\end{matrix}$

[0087] Generalizing (15), the smooth ML objective function is,$\begin{matrix}{{J(\tau)} = {\min\limits_{{a_{1}\delta_{1}},\ldots,a_{N},\delta}{\sum\limits_{\quad \omega}{{- \frac{1}{\lambda}}1{n\left( {e^{- {{\lambda p}{({a_{1},\delta_{1},\omega,\tau})}}} + \cdots + e^{- {{\lambda p}{({a_{N},\delta_{N},\omega,\tau})}}}} \right)}}}}} & (17)\end{matrix}$

[0088] which has partials, $\begin{matrix}{{\frac{\partial{J(\tau)}}{\partial\delta_{j}} = {\sum\limits_{\omega}\quad {\frac{e^{- {{\lambda p}{({a_{j},\delta_{j},\omega,\tau})}}}}{\sum_{l}\quad e^{- {{\lambda p}{({a_{l},\delta_{l},\omega,\tau})}}}}\frac{{- 2}\omega \quad a_{j}}{1 + a_{j}^{2}}{Im}\left\{ {{X_{1}\left( {\omega,\tau} \right)}\overset{\_}{X_{2}\left( {\omega,\tau} \right)}e^{- {i\omega\delta}_{j}}} \right\}}}}{{and},}} & (18) \\{\frac{\partial{J(\tau)}}{\partial a_{j}} = {\sum_{\omega}{\frac{e^{- {\lambda p}_{j}}}{\sum_{k}e^{- {\lambda pk}}}\frac{2}{\left( {1 + a_{j}^{2}} \right)^{2}}\left( \left( {{\left( {a_{j}^{2} - 1} \right){Re}\left\{ {{X_{1}\left( {\omega,\tau} \right)}\left( \overset{\_}{{X_{2}\left( {\omega,\tau} \right)}e} \right)^{- {iw\delta}_{j}}} \right\}} + {a_{j}\left( \left| {X_{1}\left( {\omega,\tau} \right)} \middle| {}_{2}{+ \left| {X_{2}\left( {\omega,\tau} \right)} \right|^{2}} \right. \right)}} \right) \right.}}} & (19)\end{matrix}$

[0089] We assume we know the number of sources we are searching for andinitialize an amplitude and delay estimate pair to random values foreach source. The estimates (α_(j)[k],δ_(j)[k]) for the current timeτ_(k)=kτ_(Δ) (where τ_(Δ) is the time separating adjacent time windows)are updated based on the previous estimate and the current gradient asfollows, $\begin{matrix}{{a_{j}\lbrack k\rbrack} = {{a_{j}\left\lbrack {k - 1} \right\rbrack} - {{{\beta a}_{j}\lbrack k\rbrack}\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}}}} & (20) \\{{\delta_{j}\lbrack k\rbrack} = {{\delta_{j}\left\lbrack {k - 1} \right\rbrack} - {{{\beta\delta}_{j}\lbrack k\rbrack}\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}}}} & (21)\end{matrix}$

[0090] where β is a learning rate constant and α_(j)[k] is a time andmixing parameter dependent learning rate for time index k for estimatej. In practice, we have found it helpful to adjust the learning ratedepending on the amount of mixture energy recently explained by thecurrent estimate. We define, $\begin{matrix}{{q_{j}\lbrack k\rbrack} = \left. {\sum\limits_{\omega}\quad \frac{e^{- {{\lambda p}{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\sum\limits_{l}\quad e^{- {{\lambda p}{({a_{1},\delta_{l},\omega,\tau})}}}}} \middle| \left. {X_{l}\left( {\omega,\tau_{k}} \right)}||{X_{2}\left( {\omega,\tau_{k}} \right)} \right. \right.} & (22)\end{matrix}$

[0091] and update the parameter dependent learning rate as follows,$\begin{matrix}{{\alpha_{j}\lbrack k\rbrack} = \frac{q_{j}\lbrack k\rbrack}{\sum\limits_{m = 0}^{k}\quad {\gamma^{k - m}{q_{j}\lbrack m\rbrack}}}} & (23)\end{matrix}$

[0092] where γ is a forgetting factor.

[0093] A description of demixing will now be given, according to anillustrative embodiment of the present invention. In order to demix thej^(th) source, we construct a time-frequency mask based on the MLparameter estimator (see (B) in the Appendix), $\begin{matrix}{{\Omega_{j}\left( {\omega,\tau} \right)} = \left\{ \begin{matrix}1 & {{p\left( {a_{j},\delta_{j},\omega,\tau} \right)} \leq {{p\left( {a_{m},\delta_{m},\omega,\tau} \right)}{\forall{m \neq j}}}} \\0 & {\quad {otherwise}}\end{matrix} \right.} & (24)\end{matrix}$

[0094] The estimate for the time-frequency representation of the j^(th)source is,

S _(j)(ω·τ)=Ω_(j)(ω,τ)X ₁(ω,τ)  (25)

[0095] We then reconstruct the source using the appropriate dual windowfunction. The preceding is further described by I. Daubechies, in “TenLectures on Wavelets”, ch. 3, SIAM, Philadelphia, Pa., 1992. In thisway, we demix all the sources by partitioning the time-frequencyrepresentation of one of the mixtures. Note that because the method doesnot invert the mixing matrix, it can demix all sources even when thenumber of sources is greater than the number of mixtures (N>M).

[0096] A description of tests performed with respect to an illustrativeembodiment of the present invention will now be given. We tested themethod on mixtures created in both an anechoic room and an echoic officeenvironment. The algorithm used parameters β=0.02, γ=0.95, λ=10 and aHamming window of size 512 samples (with adjacent windows separated by128 samples) in all the tests. For all tests, the method ran more than 5times faster than real time.

[0097]FIG. 7 is a diagram illustrating a test setup for blind sourceseparation on anechoic data, according to an illustrative embodiment ofthe present invention. Microphones are separated by ˜1.75 cm centeredalong the 180 degree to 0 degree line. The X's show the source locationsused in the anechoic tests. The O's show the locations of the sources inthe echoic tests. Separate recordings at 16 kHz were made of six speechfiles (4 female, 2 male) taken from the TIMIT database played from aloudspeaker placed at the X marks in FIG. 7. Pairwise mixtures were thencreated from all possible voice/angle combinations, excluding same voiceand same angle combinations, yielding a total of 630 mixtures(630=6×5×7×6/2).

[0098] The SNR gains of the demixtures were calculated as follows.Denote the contribution of source j on microphone k as S_(jk)(ω, τ).Thus we have,

X ₁(ω,τ)=S ₁₁(ω,τ)+S ₂₁(ω,τ)  (26)

X ₂(ω,τ)=S ₁₂(ω,τ)+S ₂₂(ω,τ)  (27)

[0099] As we do not know the permutation of the demixing, we calculatethe SNR gain conservatively,${SNR}_{1} = {{\max \left( {{10\log \frac{{{\Omega_{1}S_{11}}}^{2}}{{{\Omega_{1}S_{21}}}^{2}}},{10\log \frac{{{\Omega_{2}S_{12}}}^{2}}{{{\Omega_{2}S_{22}}}^{2}}}} \right)} - {\max \left( {{10\log \frac{{S_{11}}^{2}}{{S_{21}}^{2}}},{10\log \frac{{S_{12}}^{2}}{{S_{22}}^{2}}}} \right)}}$${SNR}_{2} = {{- {\min \left( {{10\log \frac{{{\Omega_{1}S_{11}}}^{2}}{{{\Omega_{1}S_{21}}}^{2}}},{10\log \frac{{{\Omega_{2}S_{12}}}^{2}}{{{\Omega_{2}S_{22}}}^{2}}}} \right)}} + {\min \left( {{10\log \frac{{S_{11}}^{2}}{{S_{21}}^{2}}},{10\log \frac{{S_{12}}^{2}}{{S_{22}}^{2}}}} \right)}}$

[0100] In order to give the method time to learn the mixing parameters,the SNR results do not include the first half second of data.

[0101]FIG. 8 is a diagram illustrating the average SNR gain results foreach angle difference for the anechoic data, according to anillustrative embodiment of the present invention. That is, FIG. 8illustrates a comparison of overall separation SNR gain by angledifference, according to an illustrative embodiment of the presentinvention. As an example, the 60 degree difference results average allthe 10-70, 40-100, 70-130, 100-160, and 130-190 results. Each bar showsthe maximum SNR gain, one standard deviation above the mean, the mean(which is labeled), one standard deviation below the mean, and theminimum SNR gain over all the tests (both SNR₁ and SNR₂) are included inthe averages). The separation results improve as the angle differenceincreases. FIG. 9 is a diagram illustrating the 30 degree differenceresults by angle comparison for the anechoic data, averaging 30 testsper angle comparison, according to an illustrative embodiment of thepresent invention. That is, FIG. 9 illustrates the overall separationSNR gain by 30 degree angle pairing, according to an illustrativeembodiment of the present invention. The performance is a function ofthe delay. That is, the worst performance is achieved for the smallestdelay (corresponding to the 10-40 mixtures), and so forth.

[0102] Recordings were also made in an echoic office with reverberationtime of ˜500 ms, that is, the impulse response of the room fell to −60dB after 500 ms. For the echoic tests, the sources were placed at 0, 90,120, 150, and 180 degrees (see the O's in FIG. 7). FIG. 11 is a diagramillustrating separation results for pairwise mixtures of voices (4female, 4 male), according to an illustrative embodiment of the presentinvention. Separation results for pairwise mixtures of voices (4 female,4 male) and noises (line printer, copy machine, and vacuum cleaner) areshown in FIG. 10. That is, FIG. 10 is a diagram illustrating acomparison of overall separation SNR gain by angle difference, usingechoic office data in a voice versus noise comparison, according to anillustrative embodiment of the present invention. The results areconsiderably worse in the echoic case, which is not surprising as themethod assumes anechoic mixing. However, the method does achieve 5 dBSNR gain on average and is real-time. TABLE 1 AVV EVV EVN Number oftests 630 560 480 Mean SNR gain (dB) 15.31 5.09 4.41 Std SNR gain (dB)5.69 3.34 2.87 Max SNR gain (dB) 25.65 15.18 14.61 Min SNR gain (dB)−0.21 −0.42 −0.50

[0103] Summary results for all three testing groups (anechoic, echoicvoice vs. voice, and echoic voice vs. noise) are shown in the Table 1.In the table, the following designations are employed: AVV=AnechoicVoice vs. Voice; EVV=Echoic Voice vs. Voice; and EVN=Echoic Voice vs.Noise. We have presented a real-time version of the DUET algorithm thatuses gradient descent to learn the anechoic mixing parameters and thendemixes by partitioning the time-frequency representations of themixtures. We have also introduced a measure of W-disjoint orthogonalityand provided empirical evidence for the approximate W-disjointorthogonality of speech signals.

[0104] Appendix A

[0105] Appendix A describes the justification for the W-disjointorthogonality of speech assumption employed herein, according to anillustrative embodiment of the present invention. Clearly, theW-disjoint orthogonality assumption is not exactly satisfied for oursignals of interest. We introduce here a measure of W-disjointorthogonality for a group of sources and show that speech signals areindeed nearly W-disjoint orthogonal to each other. Consider thetime-frequency mask, $\begin{matrix}{{\Phi_{x}^{12}\left( {\omega,\tau} \right)} = \left\{ \begin{matrix}1 & {{{20{\log \left( {{{S_{1}\left( {\omega,\tau} \right)}}/{{S_{2}\left( {\omega,\tau} \right)}}} \right)}}\rangle}x} \\0 & {\quad {otherwise}}\end{matrix} \right.} & (28)\end{matrix}$

[0106] and the resulting energy ratio, ti r(x)=∥Φ_(x) ¹²(ω,τ)S ₁(ω,τ)∥²/∥S ₁(ω,τ)∥²  (29)

[0107] which measures the percentage of energy of source 1 fortime-frequency points where it dominates source 2 by x dB. We proposer(x) as a measure of W-disjoint orthogonality. For example, FIG. 12shows r(x) averaged for pairs of sources used in the demixing tests. Wecan see from the graph that r(3)>0.9 for all three, and thus say thatthe signals used in the tests were 90% W-disjoint orthogonal at 3 dB. Ifwe can correctly map time-frequency points with 3 dB or more singlesource dominance to the correct corresponding output partition, we canrecover the 90% of the energy of the original sources. FIG. 12 alsodemonstrates the W-disjoint orthogonality of six speech signals taken asa group and the fact that independent Gaussian white noise processes areless than 50% W-disjoint orthogonal at all levels.

[0108] Appendix B

[0109] Appendix B describes the ML Estimation for the DUET Modelemployed herein, according to an illustrative embodiment of the presentinvention. Assume a mixing model of type (1)-(2) to which we addmeasurement noise: $\begin{matrix}{{X_{1}\left( {\omega,\tau} \right)} = {{\sum\limits_{j = 1}^{N}\quad {{q_{j}\left( {\omega,\tau} \right)}{S_{j}\left( {\omega,\tau} \right)}}} + {v_{1}\left( {\omega,\tau} \right)}}} & (30) \\{{X_{2}\left( {\omega,\tau} \right)} = {{\sum\limits_{j = 1}^{N}\quad {a_{j}^{{- {\omega\delta}}\quad j}{q_{j}\left( {\omega,\tau} \right)}{S_{j}\left( {\omega,\tau} \right)}}} + {v_{2}\left( {\omega,\tau} \right)}}} & (31)\end{matrix}$

[0110] The ideal model (1)-(2) is obtained in the limit ν₁, ν₂ ? 0. Inpractice, we make the computations assuming the existence of such anoise, and then we pass to the limit. We assume the noise and sourcesignals are Gaussian distributed and independent from one another, withzero mean and known variances: $\left. \begin{bmatrix}{v_{1}\left( {\omega,\tau} \right)} \\{v_{2}\left( {\omega,\tau} \right)}\end{bmatrix} \right.\sim{N\left( {0,{\sigma^{2}I_{2}}} \right)}$

 S _(j)(ω,τ)˜N(0,p _(j)(ω))

[0111] The Bernoulli random variables q_(j)(ω,τ)'s are NOT independent.To accommodate the W-disjoint orthogonality assumption, we require thatfor each (ω,τ) at most one of the q_(j)(ω,τ)'s can be unity, and allothers must be zero. Thus the N-tuple (q₁(ω,τ), . . . ,q_(N)(ω,τ)) takesvalues only in the set

Q={(0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (0, 0, . . . , 1)}

[0112] of cardinality N+1. We assume uniform priors for these R.V.'s.

[0113] The short-time stationarity implies different frequencies aredecorrelated (and hence independent) from one another. We use thisproperty in constructing the likelihood. The likelihood of parameters(α₁, δ₁, . . . , α_(N), δ_(N)) given the data (X₁(ω,τ), X₂(ω,τ)) andspectral powers σ², p_(j)(ω) at a given τ, is given by conditioning withrespect to q_(j)(ω,τ)'s by: $\begin{matrix}{{{L\left( {a_{1},\delta_{1},\ldots \quad,a_{N},{\delta_{N};\tau}} \right)}:={{p\left( {{X_{1}( \cdot )},{{X_{2}( \cdot )}a_{1}},\delta_{1},\ldots \quad,a_{N},{\delta_{N};\tau},\sigma^{2},p_{j}} \right)} = {\underset{\omega \quad}{\prod\quad}\underset{{j = 0}\quad}{\overset{N\quad}{\sum\quad}}\frac{\exp \left\{ {- M} \right\}}{\pi^{2}{\det \left( {{\sigma^{2}I_{2}} + {{p_{j}(\omega)}{\Gamma_{j}(\omega)}}} \right)}}{p\left( {{q_{j}\left( {\omega,\tau} \right)} = 1} \right)}}}}{{where}\text{:}}\quad {M = {\left\lbrack {\overset{\_}{X_{1}\left( {\omega,\tau} \right)}\quad \overset{\_}{X_{2}\left( {\omega,\tau} \right)}} \right\rbrack {\left( {{\sigma^{2}I_{2}} + {{p_{j}(\omega)}{\Gamma_{j}(\omega)}}} \right)^{- 1}\begin{bmatrix}{X_{1}\left( {\omega,\tau} \right)} \\{X_{2}\left( {\omega,\tau} \right)}\end{bmatrix}}\quad {and}}}{\Gamma_{j} = {{\begin{bmatrix}1 \\{a_{j}^{- {\omega\delta}_{j}}}\end{bmatrix}\left\lbrack {1\quad a_{j}^{{\omega\delta}_{j}}} \right\rbrack} = \begin{bmatrix}1 & {a_{j}^{{\omega\delta}_{j}}} \\{a_{j}^{- {\omega\delta}_{j}}} & a_{j}^{2}\end{bmatrix}}}} & (32)\end{matrix}$

[0114] and we have defined q₀(ω,τ)=1−Σ_(k=1) ^(N)q_(k)(ω,τ), p₀(ω)=0,and Γ₀(ω)=I₂ for notational simplicity in (32) in dealing with the casewhen no source is active at a given (ω,τ).

[0115] Next, the Matrix Inversion Lemma (or an explicit computation)gives:$\left. {{{- M} = {{- \frac{1}{\sigma^{2}}}\frac{1}{\sigma^{2} + {{p_{j}(\omega)}\left( {1 + a_{j}^{2}} \right)}}\left( {{{p_{j}(\omega)}{{{a_{j}^{- {\omega\delta}_{j}}{X_{1}\left( {\omega,\tau} \right)}} - {X_{2}\left( {\omega,\tau} \right)}}}^{2}} + {\sigma^{2}\left( {{{,{X_{1}\left( {\omega,\tau} \right)}}}^{2} + {{X_{2}\left( {\omega,\tau} \right)}}^{2}} \right)}} \right)}}{and}{{\det \left( {{\sigma^{2}I_{2}} + {{p_{j}(\omega)}{\Gamma_{j}(\omega)}}} \right)} = {\sigma^{2} + {{p_{j}(\omega)}\left( {1 + a_{j}^{2}} \right)}}}} \right)$

[0116] Now we pass to the limit σ→0. The dominant terms from theprevious two equations are:${- \frac{1}{\sigma^{2}}}\frac{{{{a_{j}^{- {\omega\delta}_{j}}{X_{1}\left( {\omega,\tau} \right)}} - {X_{2}\left( {\omega,\tau} \right)}}}^{2}}{1 + a_{j}^{2}}\quad {and}\quad \sigma^{2}{p_{j}(\omega)}\left( {1 + a_{j}^{2}} \right)$

[0117] Of the N+1 terms in each sum of (32), only one term is dominant,namely the one of the largest exponent. Assume π:ω

{0,1, . . . ,N} is the selection map defined by:

π(ω)=K, if p(α_(k),δ_(k),ω,τ)≦p(α_(j),δ_(j),ω,τ)∀_(j) ≠k

[0118] where:

p(α₀,δ₀,ω,τ)=|X ₁(ω,τ)|² +|X ₂(ω,τ)|²

[0119] and for k ε{1,2, . . . ,N}:${p\left( {a_{k},\delta_{k},\omega,\tau} \right)} = \frac{{{{a_{j}^{- {\omega\delta}_{j}}{X_{1}\left( {\omega,\tau} \right)}} - {X_{2}\left( {\omega,\tau} \right)}}}^{2}}{1 + a_{j}^{2}}$

[0120] Then the likelihood becomes: $\begin{matrix}{{L\left( {a_{1},\delta_{1},\ldots \quad,a_{N},{\delta_{N};\tau}} \right)} = {\frac{C}{\sigma^{2M}}{\prod\limits_{k = 0}^{N}{\prod\limits_{\omega \quad \varepsilon \quad {\pi^{- 1}{(k)}}}{t_{k}\quad \exp \left\{ {- \frac{p\left( {a_{k},\delta_{k},\omega,\tau} \right)}{\sigma^{2}}} \right\}}}}}} & (33)\end{matrix}$

[0121] with M, the number of frequencies and: $t_{k} = \begin{Bmatrix}\frac{1}{\sigma^{2}} & {k = 0} \\\frac{1}{{p_{k}(\omega)}\left( {1 + a_{k}^{2}} \right)} & {k \in \left\{ {1,2,\ldots \quad,N} \right\}}\end{Bmatrix}$

[0122] The dominant term in log-likelihood remains the exponent. Thus:$\begin{matrix}{{\log \quad L} \approx {{- \frac{1}{\sigma^{2}}}{\sum\limits_{k = 0}^{N}{\sum\limits_{\omega \quad \varepsilon \quad {\pi^{- 1}{(k)}}}{p\left( {a_{k},\delta_{k},\omega,\tau} \right)}}}}} & (34)\end{matrix}$

[0123] and maximizing the log-likelihood is equivalent to the following(which is (12)):$\min\limits_{a_{1},\delta_{1},\ldots,a_{N},\delta_{N}}{\sum\limits_{\omega}{\min\left( {{p\left( {a_{1},\delta_{1},\omega,\tau} \right)},\ldots \quad,{p\left( {a_{N},\delta_{N},\omega,\tau} \right)}} \right)}}$

[0124] Although the illustrative embodiments have been described hereinwith reference to the accompanying drawings, it is to be understood thatthe present invention is not limited to those precise embodiments, andthat various other changes and modifications may be affected therein byone of ordinary skill in the related art without departing from thescope or spirit of the invention. All such changes and modifications areintended to be included within the scope of the invention as defined bythe appended claims.

What is claimed is:
 1. A method for blind source separation of multiplesources, comprising: detecting the multiple sources using an array ofsensors to obtain data representative of the multiple sources;representing the data by two mixtures having estimates of amplitude anddelay mixing parameters; updating the estimates of amplitude and delaymixing parameters, comprising the steps of: calculating a plurality oferror measures, each of the plurality of error measures indicating acloseness of the estimates of amplitude and delay mixing parameters fora given source to a given time-frequency point in the two mixtures; andrevising the estimates of amplitude and delay mixing parameters, basedon the plurality of error measures; filtering the two mixtures to obtainestimates of the multiple sources, comprising the steps of: selectingone of the plurality of error measures having a smallest value inrelation to any other of the plurality of error measures, for each of aplurality of time-frequency points in the mixtures; and leavingunaltered any of the plurality of time-frequency points in the mixturesfor which a given one of the plurality of error measures has thesmallest value, while setting to zero any other of the plurality oftime-frequency points in the mixtures for which the given one of theplurality of error measures does not have the smallest value, for eachof the plurality of error measures; outputting the estimates of themultiple sources.
 2. The method of claim 1, wherein said step ofrepresenting the data by two mixtures comprises the step of computing afirst mixture x1 and a second mixture x2 as follows:${{x_{1}(t)} = {\sum\limits_{j = 1}^{N}{s_{j}(t)}}},{{x_{2}(t)} = {\sum\limits_{j = 1}^{N}{a_{j}{s_{j}\left( {t - \delta_{j}} \right)}}}},$

where N is a number of the multiple sources; δ_(j) is an arrival delaybetween the array of sensors resulting from an angle of arrival: α_(j)is a relative attenuation factor corresponding to a ratio ofattenuations of paths between the multiple sources and the array ofsensors; s_(j)(t) is a j^(th) source; j is a source index ranging from 1to N, where N is a number of the multiple sources; and t is a timeargument.
 3. The method of claim 1, wherein said step of calculating theplurality of error measures comprises the step of computing${p\left( {a_{j},\delta_{j},\omega,\tau_{k}} \right)} = \left. \frac{1}{1 + a_{j}^{2}} \middle| {{{X_{1}\left( {\omega,\tau_{k}} \right)}a_{j}^{{- i}\quad \omega \quad \delta_{j}}} - {X_{2}\left( {\omega,\tau_{k}} \right)}} \right|^{2}$

where p(α_(j),δ_(j),ω,τ) is an error measure for a j^(th) source; α_(j)and δ_(j) are current estimates of amplitude and delay mixingparameters, respectively; X₁(ω,τ_(k)) and X₂((ω,τ_(k)) aretime-frequency representations of a first mixture and a second mixtureof the two mixtures, respectively; k is a current time index; τ_(k) is atime argument corresponding to a k^(th) time index; ω is a frequencyargument; and j is a source index ranging from 1 to N, where N is anumber of the multiple sources.
 4. The method of claim 1, wherein saidstep of revising the estimates of amplitude and delay mixing parameterscomprises the step of computing a magnitude and a direction of changefor each of the estimates of amplitude and delay mixing parameters. 5.The method of claim 4, wherein said step of computing the direction ofchange for each of the estimates of amplitude mixing parameterscomprises the step of computing$\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}} = \begin{matrix}{\quad {\sum_{\omega}{\frac{^{{- \lambda}\quad {p{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\sum_{l}^{{- \lambda}\quad {p{({a_{l},\delta_{l},\omega,\tau_{k}})}}}}\frac{2}{\left( {1 + a_{j}^{2}} \right)^{2}}}}} \\{\quad \left( \left( {{\left( {a_{j}^{2} - 1} \right)\text{Re}\left\{ {{X_{1}\left( {\omega,\tau_{k}} \right)}\overset{\_}{X_{2}\left( {\omega,\tau_{k}} \right)}e^{{- i}\quad \omega \quad \delta_{j}}} \right\}} +} \right. \right.} \\\left. \quad {a_{j}\left( \left| {X_{1}\left( {\omega,\tau_{k}} \right)} \middle| {}_{2}{+ \left| {X_{2}\left( {\omega,\tau_{k}} \right)} \right|^{2}} \right. \right)} \right)\end{matrix}$

where $\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}$

represents the magnitude and the direction of change in a currentestimate of amplitude mixing parameter for a j^(th) source that causes alargest change in correspondence between the current estimate and thedata; k is a current time index; τ_(k) is a time argument correspondingto a k^(th) time index; α_(j) is the current estimate of amplitudemixing parameter for the j^(th) source; δ_(j) is a current estimate ofdelay mixing parameter for the j^(th) source; X₁(ω,τ_(k)) andX₂(ω,τ_(k)) are time-frequency representations of a first mixture and asecond mixture of the two mixtures, respectively; ω is a frequencyargument; j and l are source indexes ranging from 1 to N, where N is anumber of the multiple sources; p(α_(j),δ_(j),ω,τ_(k)) is an errormeasure for the j^(th) source; λ is a smoothness parameter; and Re is afunction that returns a real part of a complex number.
 6. The method ofclaim 4, wherein said step of computing the direction of change for eachof the estimates of delay mixing parameters comprises the step ofcomputing$\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}} = {\sum\limits_{\omega}{\frac{^{{- \lambda}\quad {p{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\sum\limits_{l}^{{- \lambda}\quad {p{({a_{l},\delta_{l},\omega,\tau_{k}})}}}}\frac{{- 2}\quad \omega \quad a_{j}}{1 + a_{j}^{2}}{Im}\left\{ {{X_{1}\left( {\omega,\tau_{k}} \right)}\overset{\_}{X_{2}\left( {\omega,\tau_{k}} \right)}\quad ^{{- i}\quad \omega \quad \delta_{j}}} \right\}}}$

where $\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}$

represents the magnitude and the direction of change in a currentestimate of delay mixing parameter for a j^(th) source that causes alargest change in correspondence between the current estimate and thedata; k is a current time index; τ_(k) is a time argument correspondingto a k^(th) time index; α_(j) is a current estimate of amplitude mixingparameter for the j^(th) source; δ_(j) is the current estimate of delaymixing parameter for the j^(th) source; X₁(ω,τ_(k)) and X₂(ω,τ_(k)) aretime-frequency representations of a first mixture and a second mixtureof the two mixtures, respectively; ω is a frequency argument; j and lare source indexes ranging from 1 to N, where N is a number of themultiple sources; p(α_(j),δ_(j),ω,τ_(k)) is an error measure for thej^(th) source; λ is a smoothness parameter; and Im is a function thatreturns an imaginary part of a complex number.
 7. The method of claim 4,wherein said step of revising the estimates of amplitude and delaymixing parameters comprises the step of computing $\begin{matrix}{{a_{j}\lbrack k\rbrack} = {{a_{j}\left\lbrack {k - 1} \right\rbrack} - {\beta \quad {a_{j}\lbrack k\rbrack}\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}}}} \\{{\delta_{j}\lbrack k\rbrack} = {{\delta_{j}\left\lbrack {k - 1} \right\rbrack} - {{{\beta\delta}_{j}\lbrack k\rbrack}\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}}}}\end{matrix}$

where a_(j)[k] and δ_(j)[k] are the estimates of amplitude and delaymixing parameters for a j^(th) source at a time index k, respectively; βis a learning rate constant;$\frac{\partial{J\left( \tau_{k} \right)}}{\partial a_{j}}$

represents the magnitude and the direction of change in a currentestimate of amplitude mixing parameter for a j^(th) source that causes alargest change in correspondence between the current estimate and thedata; $\frac{\partial{J\left( \tau_{k} \right)}}{\partial\delta_{j}}$

represents the magnitude and the direction of change in a currentestimate of delay mixing parameter for a j^(th) source that causes alargest change in correspondence between the current estimate and thedata; τ_(k) is a current time; k is a current time index; and j is asource index ranging from 1 to N, wherein N is a number of the multiplesources.
 8. The method of claim 4, wherein said step of computing themagnitude of change for each of the estimates of amplitude and delaymixing parameters comprises the step of scaling the magnitude of changeby a learning rate constant and a variable learning rate.
 9. The methodof claim 8, wherein said step of computing the learning rate constantcomprises the step of computing${q_{j}\lbrack k\rbrack} = {\underset{\omega}{\Sigma}\frac{e^{{- \lambda}\quad {p{({a_{j},\delta_{j},\omega,\tau_{k}})}}}}{\underset{l}{\Sigma}\quad e^{{- \lambda}\quad {p{({a_{l},\delta_{l},\omega,\tau_{k}})}}}}{\left. {X_{1}\left( {\omega,\tau_{k}} \right)}||{X_{2}\left( {\omega,\tau_{k}} \right)} \right.}}$

where q_(j)[k] represents an amount of mixture energy that is defined byestimates of amplitude and delay mixing parameters for a j^(th) source;k is a current time index; τ_(k) is a time argument corresponding to ak^(th) time index; α_(j) and δ_(j) are current estimates of amplitudeand delay mixing parameters, respectively for the j^(th) source;X₁(ω,τ_(k)) and X₂(ω,τ_(k)) are time-frequency representations of afirst mixture and a second mixture of the two mixtures, respectively; ωis a frequency argument; j and l are source indexes ranging from 1 to N,where N is a number of the multiple sources; and λ is a smoothnessparameter.
 10. The method of claim 8, wherein said step of computing thevariable learning rate comprises the step of computing${a_{j}\lbrack k\rbrack} = \frac{q_{j}\lbrack k\rbrack}{\sum\limits_{m = 0}^{k}{\gamma^{k - m}{q_{j}\lbrack m\rbrack}}}$

where α_(j)[k] represents the variable learning rate; q_(j)[k]represents an amount of mixture energy that is defined by estimates ofamplitude and delay mixing parameters for a j^(th) source; γ is aforgetting factor; m is a time index ranging from 0 to a current timeindex k; and j is a source index ranging from 1 to N, where N is anumber of the multiple sources.
 11. The method of claim 1, wherein saidstep of selecting one of the plurality of measures comprises the step ofcomputing${\Omega_{j}\left( {\omega,\tau_{k}} \right)} = \left\{ \begin{matrix}1 & {\quad {{p\left( {a_{j},\delta_{j},\omega,\tau_{k}} \right)} \leq {{p\left( {a_{m},\delta_{m},\omega,\tau_{k}} \right)}{\forall{m \neq j}}}}} \\0 & {\quad {otherwise}}\end{matrix} \right.$

where Ω_(j)(ω,τ_(k)) is a time-frequency mask; p(α_(j),δ_(j),ω,τ_(k)) isan error measure for a j^(th) source; j and m are source indexes rangingfrom 1 to N, where N is a number of the multiple sources; τ_(k) is acurrent time; k is a current time index; α_(j) and δ_(j) are currentestimates of amplitude and delay mixing parameters for the j^(th)source, respectively; and so is a frequency argument.
 12. The method ofclaim 1, wherein said leaving and setting steps comprise the step ofcomputing S _(j)(ω,τ_(k))=Ω_(j)(ω,τ_(k))X ₁(ω,τ_(k)) whereS_(j)(ω,τ_(k)) is an estimate of a time-frequency representation of aj^(th) source; j is a source index ranging from 1 to N, where N is anumber of the multiple sources; Ω_(j)(ω,τ_(k)) is a time-frequency mask;X₁(ω,τ_(k)) is a time-frequency representation of a first mixture of thetwo mixtures; ω is a frequency argument; τ_(k) is a current time; and kis a current time index.
 13. The method of claim 1, wherein said step ofselecting one of the plurality of measures comprise the step ofconstructing a time-frequency mask based on a maximum likelihoodparameter estimation.
 14. The method of claim 13, wherein said step ofconstructing the time-frequency mask comprises the step of computing${\Omega_{j}\left( {\omega,\tau_{k}} \right)} = \left\{ \begin{matrix}1 & {\quad {{p\left( {a_{j},\delta_{j},\omega,\tau_{k}} \right)} \leq {{p\left( {a_{m},\delta_{m},\omega,\tau_{k}} \right)}{\forall{m \neq j}}}}} \\0 & {\quad {otherwise}}\end{matrix} \right.$

where Ω_(j)(ω,τ_(k)) is a time-frequency mask; p(α_(j),δ_(j),ω,τ_(k)) isan error measure for a j^(th) source; j and m are source indexes rangingfrom 1 to N, wherein N is a number of the multiple sources; τ_(k) is acurrent time; k is a current time index; α_(j) and δ_(j) are currentestimates of amplitude and delay mixing parameters for the j^(th)source, respectively; and ω is a frequency argument.
 15. The method ofclaim 1, wherein said leaving and setting steps comprise the step ofcomputing estimates for time-frequency representations of each of themultiple sources.
 16. The method of claim 15, wherein said step ofcomputing the estimates for time-frequency representations comprises thestep of computing an estimate for a j^(th) source of the multiplesources as follows: S _(j)(ω,τ_(k))=Ω_(j)(ω,τ_(k))X ₁(ω,τ_(k)) whereS_(j)(ω,τ_(k)) is an estimate of a time-frequency representation of aj^(th) source; j is a source index ranging from 1 to N, where N is anumber of the multiple sources; Ω_(j)(ω,τ_(k)) is a time-frequency mask;X₁(ω,τ_(k)) is a time-frequency representation of a first mixture of thetwo mixtures; ω is a frequency argument; τ_(k) is a current time; k is acurrent time index.
 17. The method of claim 1, wherein said filteringstep comprises the step of partitioning a time-frequency representationof one of the two mixtures to demix all of the multiple sources.
 18. Themethod of claim 1, wherein said outputting step comprises the step ofreconstructing the multiple sources using a dual window function appliedto the estimates.
 19. The method of claim 1, further comprising the stepof computing a frequency domain representation for the two mixtures,prior to said updating step.
 20. The method of claim 1, furthercomprising the step of computing a time domain representation for theestimates of the multiple sources, prior to said outputting step. 21.The method of claim 1, wherein said method is implemented by a programstorage device readable by machine, tangibly embodying a program ofinstructions executable by the machine to perform said method steps.